---
title: "Tea, Earl Grey, Hot: Designing Speech Interactions from the Imagined Ideal of Star Trek"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    theme: spacelab
---
```{r setup, include=FALSE}
library(flexdashboard)
library(readr)
library(knitr)
library(tidyverse)
library(purrr)
library(broom)
library(plotly)
library(wordcloud)
library(RColorBrewer)
library(tm)
library(wordcloud2)
startrek <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-17/computer.csv')
```
### Data Description
```{r}
include_graphics('https://raw.githubusercontent.com/LaurS12/ERHS535_Group_Project/main/Images/data_description.png')
#note: this looks like garbage in the markdown file, but if you knit, it shows up correct.
```
***
Speech is now common in daily interactions with our devices, thanks to voice user interfaces (VUIs) like Alexa. Despite their seeming ubiquity, designs often do not match users’ expectations. Science fiction, which is known to influence design of new technologies, has included VUIs for decades. Star Trek: The Next Generation is a prime example of how people envisioned ideal VUIs. Understanding how current VUIs live up to Star Trek’s utopian technologies reveals mismatches between current designs and user expectations, as informed by popular fiction. Combining conversational analysis and VUI user analysis, we study voice interactions with the Enterprise’s computer and compare them to current interactions. Independent of futuristic computing power, we find key design-based differences: Star Trek interactions are brief and functional, not conversational, they are highly multimodal and context-driven, and there is often no spoken computer response. From this, we suggest paths to better align VUIs with user expectations.
Data source: https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-08-17/readme.md
### Chart 1: Lauren
```{r}
```
### Chart 2: Kim
```{r}
```
### Verbal vs. Non-Verbal Computer Responses per Primary Types of Voice Interactions
```{r, results='hide'}
no_comp_voice <- startrek %>%
filter(char != "Computer Voice") %>%
filter(char != "Computer") %>%
filter(char != "Computer (V.O.)") %>%
filter(char != "Computer (V.O)") %>%
filter(char != "Computer Voice (V.O.)") %>%
filter(char != "New Computer Voice") %>%
filter(char != "Com Panel (V.O.)") %>%
filter(char != "Computer'S Voice") %>%
filter(char != "Computer (Voice)") %>%
filter(char != "Computer Voice (Cont'D)")
no_comp_voice <- no_comp_voice %>%
select('pri_type', 'nv_resp')
no_comp_voice$nv_resp <- as.factor(no_comp_voice$nv_resp)
no_comp_voice$pri_type <- as.factor(no_comp_voice$pri_type)
no_comp_voice$nv_resp <- no_comp_voice$nv_resp %>%
recode_factor("TRUE" = "Non-Verbal Response") %>%
recode_factor("FALSE" = "Verbal Response")
levels(no_comp_voice$pri_type)
no_comp_voice <- no_comp_voice %>%
group_by(pri_type, nv_resp) %>%
tally()
no_comp_voice <- no_comp_voice %>%
pivot_wider(names_from = nv_resp, values_from = n)
no_comp_voice[is.na(no_comp_voice)] = 0
no_comp_voice <- no_comp_voice %>%
rename(n_verbal = "Verbal Response") %>%
rename(n_non_verbal = "Non-Verbal Response")
no_comp_voice$total_resp <- no_comp_voice$n_verbal + no_comp_voice$n_non_verbal
no_comp_voice
prop_verbal <- no_comp_voice %>%
mutate(prop_test = purrr::map2(.x= n_verbal,
.y= total_resp,
.f= prop.test))
prop_verbal <- prop_verbal %>%
mutate(prop_tidy = purrr::map(prop_test, ~tidy(.x)))
prop_non_verbal <- no_comp_voice %>%
mutate(prop_test = purrr::map2(.x= n_non_verbal,
.y= total_resp,
.f= prop.test))
prop_non_verbal <- prop_non_verbal %>%
mutate(prop_tidy = purrr::map(prop_test, ~tidy(.x)))
prop_verbal <- prop_verbal%>%
unnest(prop_tidy)
prop_non_verbal <- prop_non_verbal%>%
unnest(prop_tidy)
prop_verbal <- prop_verbal %>%
select(-prop_test)
prop_non_verbal <- prop_non_verbal %>%
select(-prop_test)
prop_verbal <- prop_verbal %>%
select(pri_type, estimate, conf.low, conf.high, n_verbal)
prop_non_verbal <- prop_non_verbal %>%
select(pri_type, estimate, conf.low, conf.high, n_non_verbal)
prop_verbal <- prop_verbal %>%
mutate(estimate = as.numeric(estimate),
conf.low = as.numeric(conf.low),
conf.high = as.numeric(conf.high))
prop_non_verbal <- prop_non_verbal %>%
mutate(estimate = as.numeric(estimate),
conf.low = as.numeric(conf.low),
conf.high = as.numeric(conf.high))
prop_verbal <- prop_verbal %>%
arrange(desc(estimate))
prop_non_verbal <- prop_non_verbal %>%
arrange(desc(estimate))
prop_verbal$resp <- "Verbal"
prop_non_verbal$resp <- "Non-Verbal"
# bind_rows() matches columns by name and fills the unshared count columns with NA
resp_per_int <- bind_rows(prop_verbal, prop_non_verbal)
resp_per_int[is.na(resp_per_int)] = 0
resp_per_int$n <- resp_per_int$n_non_verbal + resp_per_int$n_verbal
resp_per_int <- resp_per_int %>%
select(-n_non_verbal) %>%
select(-n_verbal)
resp_per_int$resp <- as.factor(resp_per_int$resp)
resp_per_int$pri_type <- factor(resp_per_int$pri_type, levels = c("Password", "Conversation", "Question", "Wake Word", "Comment", "Command", "Statement"))
```
```{r, include=FALSE}
chart_3 <- resp_per_int %>%
  ungroup() %>%
  # label, label2 and label3 are carried into the plotly tooltip
  ggplot(aes(label = conf.low,
             label2 = conf.high,
             label3 = n)) +
  geom_col(aes(x = estimate, y = pri_type, fill = resp), position = "fill") +
  labs(title = "Proportions of Computer Response Type",
       y = "Person Interaction Type",
       x = "Percent of Responses",
       subtitle = "Bars show 95% confidence interval",
       fill = "") +
  scale_x_continuous(labels = scales::percent) +
  scale_fill_brewer(palette = "Paired") +
  # Apply the complete theme first so the title adjustment is not overridden
  theme_bw() +
  theme(plot.title = element_text(hjust = -0.45, vjust = 2.12))
```
```{r}
ggplotly(chart_3, height=300, width=700)
```
***
When we interact with voice-command technology, we use different types of interactions: to 'wake' the system ("Hey Siri..."), to 'command' it ("Play a song on Spotify"), to 'question' it ("What is the temperature for today?"), and so on.
These interaction types can be chained, as in "Hey Siri, play a song on Spotify". The primary interaction type in this phrase, however, is the command to play a song on Spotify.
On the Starship Enterprise, the crew interacts with the Computer through different primary 'Interaction Types'. Definitions and examples of these interaction types can be found below.
| Interaction Type | Definition | Examples |
|------------------|-------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| Command | Utterances that directly tell the computer what to do. | Run a diagnostic on the port nacelle. |
| Question | Utterances that ask the computer for something. | Where is Captain Picard? |
| Statement        | Utterances that neither command nor ask the computer, but whose meaning is inferred.                              | Deck four. I wish to learn about Earth.                 |
| Password | Utterances that contain a password. | This is Captain Picard. |
| Wake Word | Key phrases used to activate the computer. | Computer. Holodeck. |
| Comment | Utterances that have no intended action for the computer. | Excellent. Ferrazene has a complex molecular structure. |
| Conversation     | Utterances that are more like human conversation, such as phatic expressions, formalities, and colloquial speech. | Well, check it again! Then run it for us, dear.         |
Because the Computer on the Starship Enterprise can generate objects and display information without speaking, it is of interest to examine how often it responds verbally versus non-verbally (which includes responding through actions only).
The visualization to the left shows the proportion of verbal versus non-verbal responses for each type of interaction initiated by a person. This information can help us understand which types of interactions are more likely to result in verbal or non-verbal responses from the Starship Enterprise Computer.
From the proportion tests behind the visualization, we can see that Wake Word, Question, Conversation, and Password interactions are most likely to result in a verbal response from the Computer, whereas Statement, Command, and Comment interactions result in verbal and non-verbal responses about equally often. One limitation of this analysis is that the sample sizes for certain combinations of interaction and response are small.
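As a rough illustration of what a single proportion test returns, the sketch below runs base R's `prop.test()` for one interaction type. The counts and the variable names `n_verbal_example` and `total_example` are hypothetical placeholders, not values from the dataset.

```r
# Hypothetical counts for one interaction type (not taken from the dataset)
n_verbal_example <- 40    # responses that were spoken
total_example <- 100      # all responses for this interaction type

# One-sample test of a proportion with a 95% confidence interval (the default)
verbal_test <- prop.test(x = n_verbal_example, n = total_example)

# Point estimate and interval bounds, the quantities plotted per interaction type
verbal_estimate <- unname(verbal_test$estimate)
verbal_ci <- as.numeric(verbal_test$conf.int)

verbal_estimate
verbal_ci
```

With small totals the interval becomes wide, which is why the low sample sizes for some interaction types limit how much can be read into their proportions.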
Data source: http://www.speechinteraction.org/TNG/TeaEarlGreyHotDatasetCodeBook.pdf
### Chart 4: Stacey
```{r}
```
***
```{r, results='hide', include=FALSE}
#Courtney data cleaning
comp_voice <- startrek %>%
  # %in% tests membership; == would recycle the vector and silently drop rows
  filter(char %in% c("Computer Voice", "Computer", "Computer (V.O.)",
                     "Computer (V.O)", "Computer Voice (V.O.)", "New Computer Voice",
                     "Com Panel (V.O.)", "Computer'S Voice", "Computer (Voice)",
                     "Computer Voice (Cont'D)"))
person_voice <- startrek %>%
filter(char != "Computer Voice") %>%
filter(char != "Computer") %>%
filter(char != "Computer (V.O.)") %>%
filter(char != "Computer (V.O)") %>%
filter(char != "Computer Voice (V.O.)") %>%
filter(char != "New Computer Voice") %>%
filter(char != "Com Panel (V.O.)") %>%
filter(char != "Computer'S Voice") %>%
filter(char != "Computer (Voice)") %>%
filter(char != "Computer Voice (Cont'D)")
```
### How common is each word used by the characters?
```{r}
#person lines only
# Filter to necessary column
textperson <- person_voice$interaction
# Clean text
docsperson <- Corpus(VectorSource(textperson))
docsperson <- docsperson %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace)
docsperson <- tm_map(docsperson, content_transformer(tolower))
docsperson <- tm_map(docsperson, removeWords, stopwords("english"))
# Create matrix with counts
dtmperson <- TermDocumentMatrix(docsperson)
matrixperson <- as.matrix(dtmperson)
wordsperson <- sort(rowSums(matrixperson),decreasing = TRUE)
dfperson <- data.frame(word = names(wordsperson), freq = wordsperson)
# Wordcloud
wordcloud2(data = dfperson, size = 2, color= "random-light", shape = "circle", backgroundColor = "black")
```
***
NEEDS NUMBER EDITING
When looking at text, a natural question is how often each word is used. A great way to visualize this is with a word cloud: a bundle of words whose sizes reflect how often each word was used.
This image was created using the spoken lines from all of the characters (except the computer), counting each word individually. Interestingly, "program" appears to be the most common word in the cloud, with 193 uses; however, the most used word overall was "computer", with 1036 uses. A word cloud dominated by a single word would not show much, so we removed this extreme outlier to produce an image that visualizes the Star Trek speech.
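The counting and outlier-removal steps described above can be sketched in base R. The example lines below are hypothetical and not taken from the dataset, and this is a simplified stand-in for the `tm` pipeline rather than the exact code behind the image.

```r
# Hypothetical spoken lines (not taken from the dataset)
spoken_lines <- c("Computer, run program one.",
                  "Computer, end program.",
                  "Computer, locate Captain Picard.")

# Lower-case the text, strip punctuation, and split on whitespace
clean_lines <- gsub("[[:punct:]]", "", tolower(spoken_lines))
words <- unlist(strsplit(clean_lines, "\\s+"))

# Count each word and sort from most to least frequent
freq <- sort(table(words), decreasing = TRUE)
word_counts <- data.frame(word = names(freq), freq = as.integer(freq))

# Drop the dominant outlier so the remaining words are comparable in size
word_counts_trimmed <- word_counts[word_counts$word != "computer", ]
word_counts_trimmed
```

A data frame with `word` and `freq` columns like `word_counts_trimmed` has the same shape as the one passed to `wordcloud2()`.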
### How common is each word used by the computer?
```{r}
# Packages (moved to top)
#computer lines only
# Filter to necessary column
textcomp <- comp_voice$interaction
# Clean text
docscomp <- Corpus(VectorSource(textcomp))
docscomp <- docscomp %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace)
docscomp <- tm_map(docscomp, content_transformer(tolower))
docscomp <- tm_map(docscomp, removeWords, stopwords("english"))
# Create matrix with counts
dtmcomp <- TermDocumentMatrix(docscomp)
matrixcomp <- as.matrix(dtmcomp)
wordscomp <- sort(rowSums(matrixcomp),decreasing = TRUE)
dfcomp <- data.frame(word = names(wordscomp), freq = wordscomp)
# Wordcloud
compcloud<-wordcloud2(data = dfcomp, size = 0.5, color= "random-light", shape= "circle", backgroundColor = "black")
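# Install PhantomJS only when it is missing, rather than on every knit
if (!webshot::is_phantomjs_installed()) {
  webshot::install_phantomjs()
}
# Save the widget as HTML, then capture it as a static PNG
htmlwidgets::saveWidget(compcloud, "1.html", selfcontained = FALSE)
webshot::webshot("1.html", "1.png", vwidth = 700, vheight = 500, delay = 10)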
```
***
NEEDS EDITING
When looking at text, a natural question is how often each word is used. A great way to visualize this is with a word cloud: a bundle of words whose sizes reflect how often each word was used.
This image was created using the spoken lines from the computer, counting each word individually. Interestingly, "program" appears to be the most common word in the cloud, with 193 uses; however, the most used word overall was "computer", with 1036 uses. A word cloud dominated by a single word would not show much, so we removed this extreme outlier to produce an image that visualizes the Star Trek speech.